AITopics | Álava Province

Dimensionality Reduction (DR) is widely used for visualizing high-dimensional data, often with the goal of revealing expected cluster structure. However, such a structure may not always appear in the projections. Existing DR quality metrics assess projection reliability (to some extent) or cluster structure quality, but do not explain why expected structures are missing. Visual Analytics solutions can help, but are often time-consuming due to the large hyperparameter space. This paper addresses this problem by leveraging a recent framework that divides the DR process into two phases: a relationship phase, where similarity relationships are modeled, and a mapping phase, where the data is projected accordingly. We introduce two supervised metrics, precision and recall, to evaluate the relationship phase. These metrics quantify how well the modeled relationships align with an expected cluster structure based on some set of labels representing this structure. We illustrate their application using t-SNE and UMAP, and validate the approach through various usage scenarios. Our approach can guide hyperparameter tuning, uncover projection artifacts, and determine if the expected structure is captured in the relationships, making the DR process faster and more reliable.

data mining, machine learning, projection, (17 more...)

arXiv.org Artificial Intelligence

2509.04222

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Portugal > Coimbra > Coimbra (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.51)

Add feedback

HtFLlib: A Comprehensive Heterogeneous Federated Learning Library and Benchmark

Zhang, Jianqing, Wu, Xinghao, Zhou, Yanbing, Sun, Xiaoting, Cai, Qiqi, Liu, Yang, Hua, Yang, Zheng, Zhenzhe, Cao, Jian, Yang, Qiang

arXiv.org Artificial IntelligenceJun-5-2025

As AI evolves, collaboration among heterogeneous models helps overcome data scarcity by enabling knowledge transfer across institutions and devices. Traditional Federated Learning (FL) only supports homogeneous models, limiting collaboration among clients with heterogeneous model architectures. To address this, Heterogeneous Federated Learning (HtFL) methods are developed to enable collaboration across diverse heterogeneous models while tackling the data heterogeneity issue at the same time. However, a comprehensive benchmark for standardized evaluation and analysis of the rapidly growing HtFL methods is lacking. Firstly, the highly varied datasets, model heterogeneity scenarios, and different method implementations become hurdles to making easy and fair comparisons among HtFL methods. Secondly, the effectiveness and robustness of HtFL methods are under-explored in various scenarios, such as the medical domain and sensor signal modality. To fill this gap, we introduce the first Heterogeneous Federated Learning Library (HtFLlib), an easy-to-use and extensible framework that integrates multiple datasets and model heterogeneity scenarios, offering a robust benchmark for research and practical applications. Specifically, HtFLlib integrates (1) 12 datasets spanning various domains, modalities, and data heterogeneity scenarios; (2) 40 model architectures, ranging from small to large, across three modalities; (3) a modularized and easy-to-extend HtFL codebase with implementations of 10 representative HtFL methods; and (4) systematic evaluations in terms of accuracy, convergence, computation costs, and communication costs. We emphasize the advantages and potential of state-of-the-art HtFL methods and hope that HtFLlib will catalyze advancing HtFL research and enable its broader applications. The code is released at https://github.com/TsingZ0/HtFLlib.

artificial intelligence, deep learning, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2506.03954

Country:

Asia > China > Shanghai > Shanghai (0.05)
North America > Canada > Ontario > Toronto (0.05)
Asia > China > Hong Kong (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Diagnostic Medicine (0.93)
Health & Medicine > Therapeutic Area (0.93)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

FreRA: A Frequency-Refined Augmentation for Contrastive Learning on Time Series Classification

Tian, Tian, Miao, Chunyan, Qian, Hangwei

arXiv.org Artificial IntelligenceMay-30-2025

Contrastive learning has emerged as a competent approach for unsupervised representation learning. However, the design of an optimal augmentation strategy, although crucial for contrastive learning, is less explored for time series classification tasks. Existing predefined time-domain augmentation methods are primarily adopted from vision and are not specific to time series data. Consequently, this cross-modality incompatibility may distort the semantically relevant information of time series by introducing mismatched patterns into the data. To address this limitation, we present a novel perspective from the frequency domain and identify three advantages for downstream classification: global, independent, and compact. To fully utilize the three properties, we propose the lightweight yet effective Frequency Refined Augmentation (FreRA) tailored for time series contrastive learning on classification tasks, which can be seamlessly integrated with contrastive learning frameworks in a plug-and-play manner. Specifically, FreRA automatically separates critical and unimportant frequency components. Accordingly, we propose semantic-aware Identity Modification and semantic-agnostic Self-adaptive Modification to protect semantically relevant information in the critical frequency components and infuse variance into the unimportant ones respectively. Theoretically, we prove that FreRA generates semantic-preserving views. Empirically, we conduct extensive experiments on two benchmark datasets, including UCR and UEA archives, as well as five large-scale datasets on diverse applications. FreRA consistently outperforms ten leading baselines on time series classification, anomaly detection, and transfer learning tasks, demonstrating superior capabilities in contrastive representation learning and generalization in transfer learning scenarios across diverse datasets.

artificial intelligence, frequency component, machine learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3711896.3736969

2505.23181

Country:

Europe > Austria > Vienna (0.14)
North America > Canada > Ontario > Toronto (0.05)
Asia > Singapore (0.04)
(11 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Benchmarking Classical, Deep, and Generative Models for Human Activity Recognition

Hossain, Md Meem, Han, The Anh, Ara, Safina Showkat, Shamszaman, Zia Ush

arXiv.org Artificial IntelligenceJan-14-2025

Human Activity Recognition (HAR) has gained significant importance with the growing use of sensor-equipped devices and large datasets. This paper evaluates the performance of three categories of models : classical machine learning, deep learning architectures, and Restricted Boltzmann Machines (RBMs) using five key benchmark datasets of HAR (UCI-HAR, OPPORTUNITY, PAMAP2, WISDM, and Berkeley MHAD). We assess various models, including Decision Trees, Random Forests, Convolutional Neural Networks (CNN), and Deep Belief Networks (DBNs), using metrics such as accuracy, precision, recall, and F1-score for a comprehensive comparison. The results show that CNN models offer superior performance across all datasets, especially on the Berkeley MHAD. Classical models like Random Forest do well on smaller datasets but face challenges with larger, more complex data. RBM-based models also show notable potential, particularly for feature learning. This paper offers a detailed comparison to help researchers choose the most suitable model for HAR tasks.

activity recognition, artificial intelligence, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2501.08471

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > District of Columbia > Washington (0.04)
Europe > United Kingdom > England > Tyne and Wear > Sunderland (0.04)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.93)

Industry:

Information Technology (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

Add feedback

Content-aware Balanced Spectrum Encoding in Masked Modeling for Time Series Classification

Han, Yudong, Wang, Haocong, Hu, Yupeng, Gong, Yongshun, Song, Xuemeng, Guan, Weili

arXiv.org Artificial IntelligenceDec-17-2024

Due to the superior ability of global dependency, transformer and its variants have become the primary choice in Masked Time-series Modeling (MTM) towards time-series classification task. In this paper, we experimentally analyze that existing transformer-based MTM methods encounter with two under-explored issues when dealing with time series data: (1) they encode features by performing long-dependency ensemble averaging, which easily results in rank collapse and feature homogenization as the layer goes deeper; (2) they exhibit distinct priorities in fitting different frequency components contained in the time-series, inevitably leading to spectrum energy imbalance of encoded feature. To tackle these issues, we propose an auxiliary content-aware balanced decoder (CBD) to optimize the encoding quality in the spectrum space within masked modeling scheme. Specifically, the CBD iterates on a series of fundamental blocks, and thanks to two tailored units, each block could progressively refine the masked representation via adjusting the interaction pattern based on local content variations of time-series and learning to recalibrate the energy distribution across different frequency components. Moreover, a dual-constraint loss is devised to enhance the mutual optimization of vanilla decoder and our CBD. Extensive experimental results on ten time-series classification datasets show that our method nearly surpasses a bunch of baselines. Meanwhile, a series of explanatory results are showcased to sufficiently demystify the behaviors of our method.

artificial intelligence, machine learning, representation, (16 more...)

arXiv.org Artificial Intelligence

2412.13232

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Maryland > Baltimore (0.04)
(9 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Extreme AutoML: Analysis of Classification, Regression, and NLP Performance

Ratner, Edward, Farmer, Elliot, Warner, Brandon, Douglas, Christopher, Lendasse, Amaury

arXiv.org Artificial IntelligenceDec-11-2024

Utilizing machine learning techniques has always required choosing hyperparameters. This is true whether one uses a classical technique such as a KNN or very modern neural networks such as Deep Learning. Though in many applications, hyperparameters are chosen by hand, automated methods have become increasingly more common. These automated methods have become collectively known as automated machine learning, or AutoML. Several automated selection algorithms have shown similar or improved performance over state-of-the-art methods. This breakthrough has led to the development of cloud-based services like Google AutoML, which is based on Deep Learning and is widely considered to be the industry leader in AutoML services. Extreme Learning Machines (ELMs) use a fundamentally different type of neural architecture, producing better results at a significantly discounted computational cost. We benchmark the Extreme AutoML technology against Google's AutoML using several popular classification data sets from the University of California at Irvine's (UCI) repository, and several other data sets, observing significant advantages for Extreme AutoML in accuracy, Jaccard Indices, the variance of Jaccard Indices across classes (i.e. class variance) and training times.

automl, dataset, extreme automl, (15 more...)

arXiv.org Artificial Intelligence

2412.07

Country: